-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Document upcoming MXNet training script format #390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document upcoming MXNet training script format #390
Conversation
This reverts commit 891bf59.
src/sagemaker/mxnet/README.rst
Outdated
3. Save the model | ||
|
||
Hyperparameters will be passed as command-line arguments to your training script. | ||
In addition, the locations for finding input data and saving the model and output data will need to be defined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pass these in the container with environment variables. I don't think users can override these in their training script. They can specify the s3 locations for these. But since we handle the data downloading and mounting the shared partition for the container they have to use the paths in the environment variables. I think we should explain this a little better and put a reference to the environment variables in sagemaker-containers. There is a list in the README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we store the locations in environment variables, but we do rely on the user to read those environment variables in their script, so they do have to define variables with those locations. I can add explanation about the environment variables though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest:
In addition, you need to define the locations of where to get the input data and where to save the model artifacts and output data.
src/sagemaker/mxnet/README.rst
Outdated
parser.add_argument('--learning-rate', type=float, default=0.1) | ||
|
||
# input data and model directories | ||
parser.add_argument('--model-dir', type=str, default='opt/ml/model') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the environment variables to set the args here as well.
src/sagemaker/mxnet/README.rst
Outdated
@@ -599,13 +599,16 @@ The code executed from your main guard needs to: | |||
3. Save the model | |||
|
|||
Hyperparameters will be passed as command-line arguments to your training script. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest replacing "will be" with "are".
src/sagemaker/mxnet/README.rst
Outdated
@@ -599,13 +599,16 @@ The code executed from your main guard needs to: | |||
3. Save the model | |||
|
|||
Hyperparameters will be passed as command-line arguments to your training script. | |||
In addition, the locations for finding input data and saving the model and output data will need to be defined. | |||
In addition, the locations for finding input data and saving the model and output data will be provided as environment variables rather than as arguments to a function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
In addition, you specify the locations of input data and where to save the model artifacts and output data as environment variables in the container, rather than as arguments to a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the container provides this info to the user, not the other way around. Would this be clearer?
In addition, the container will define the locations of input data and where to save the model artifacts and output data as environment variables rather than passing that information through train
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor change:
...rather than passing that information as arguments to the train
function.
2. Initiate training | ||
3. Save the model | ||
|
||
Hyperparameters will be passed as command-line arguments to your training script. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest replacing "will be" with "are"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used future tense because these instructions are going to be live awhile before the changes themselves are released - I'm afraid present tense might be too confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough
src/sagemaker/mxnet/README.rst
Outdated
args, _ = parser.parse_known_args() | ||
|
||
The code in the main guard should also take care of training and saving the model. | ||
This can be as simple as just calling the methods used with the previous training script format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest:
This can be as simple as calling the train
and save
methods used in the previous training script format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
The next major release of MXNet will change the training script format. This README change documents the changes needed by the user to adjust to the new format. This is currently just a warning as the new format is not out yet. The warning is meant to help users plan for the upcoming change.
Description of changes:
This change adds a warning to the README about the upcoming MXNet training script format, in addition to basic instructions of what kind of changes will be needed once the format is changed.
I didn't use a ReST admonition for the warning because GitHub won't render them.
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.